m-Bonsai: a Practical Compact Dynamic Trie
Abstract
We consider the problem of implementing a space-efficient dynamic trie, with an emphasis on good practical performance. For a trie with n nodes over an alphabet of size σ, the information-theoretic lower bound is n log σ + O(n) bits. The Bonsai data structure is a compact trie proposed by Darragh et al. (Softw., Pract. Exper. 23(3), 1993, pp. 277–291). Its disadvantages include the user having to specify an upper bound M on the trie size in advance (which cannot be changed easily after initialization), a space usage of M log σ + O(M log log M) bits (which is asymptotically non-optimal for smaller σ or if n ≪ M), and a lack of support for deletions. It supports traversal and update operations in O(1/ε) expected time (based on assumptions about the behaviour of hash functions), where ε = (M − n)/M, and has excellent speed performance in practice. We propose an alternative, m-Bonsai, that addresses the above problems, obtaining a trie that uses (1 + β)n(log σ + O(1)) bits in expectation and supports traversal operations in O(1/β) expected time and update operations in O(1/β) amortized expected time, for any user-specified parameter β > 0 (again based on assumptions about the behaviour of hash functions). We give an implementation of m-Bonsai which uses considerably less memory and is slightly faster than the original Bonsai.
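The core idea behind a Bonsai-style trie can be illustrated with a minimal Python sketch: every node lives in one open-addressing hash table of fixed size M, a node's identifier is simply the slot it occupies, and a child is found by hashing the pair (parent slot, symbol). The class name `HashTrie`, its method names, and the multiplicative hash constant are assumptions made for illustration and are not taken from the paper. The sketch stores full keys instead of the compact quotients that give Bonsai its space bound, does not mark word ends, and supports neither deletion nor growth.

```python
class HashTrie:
    """Trie whose nodes are slots of a single open-addressing hash table.

    Illustrative only: keys are stored in full (no quotienting), the table
    size is fixed at construction, and deletions are not supported.
    """

    def __init__(self, capacity, sigma):
        self.M = capacity                # fixed number of slots (upper bound on nodes)
        self.sigma = sigma               # alphabet size; symbols are ints in [0, sigma)
        self.keys = [None] * capacity    # slot -> (parent slot, symbol) or None
        self.root = 0
        self.keys[self.root] = (-1, -1)  # sentinel key so the root slot is never reused
        self.n = 1                       # number of nodes, root included

    def _probe(self, parent, symbol):
        """Return the slot holding (parent, symbol), or the first empty slot
        on its linear-probe path, or None if the table is full."""
        key = (parent, symbol)
        slot = (parent * self.sigma + symbol) * 2654435761 % self.M
        for _ in range(self.M):
            if self.keys[slot] is None or self.keys[slot] == key:
                return slot
            slot = (slot + 1) % self.M
        return None

    def get_child(self, parent, symbol):
        """Return the child's slot, or None if the child does not exist."""
        slot = self._probe(parent, symbol)
        if slot is None or self.keys[slot] is None:
            return None
        return slot

    def add_child(self, parent, symbol):
        """Return the child's slot, creating the child if necessary."""
        slot = self._probe(parent, symbol)
        if slot is None:
            raise MemoryError("hash table is full")
        if self.keys[slot] is None:
            self.keys[slot] = (parent, symbol)
            self.n += 1
        return slot

    def insert(self, word):
        """Insert a sequence of symbols and return the slot of its last node."""
        node = self.root
        for symbol in word:
            node = self.add_child(node, symbol)
        return node

    def find(self, word):
        """Return the slot reached by following word from the root, or None."""
        node = self.root
        for symbol in word:
            node = self.get_child(node, symbol)
            if node is None:
                return None
        return node

    def epsilon(self):
        """Fraction of empty slots, (M - n) / M, as in the abstract."""
        return (self.M - self.n) / self.M


trie = HashTrie(capacity=64, sigma=256)
trie.insert(b"bonsai")
trie.insert(b"bond")
```

In this usage the two words share the path "bon", so the table holds eight nodes including the root. Linear probing slows down as the table fills, which is the intuition behind the O(1/ε) expected-time bound quoted above.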
Similar resources
Improved Practical Compact Dynamic Tries
We consider the problem of implementing a dynamic trie with an emphasis on good practical performance. For a trie with n nodes with an alphabet of size σ, the information-theoretic lower bound is n log σ + O(n) bits. The Bonsai data structure [1] supports trie operations in O(1) expected time (based on assumptions about the behaviour of hash functions). While its practical speed performance is ...
Bonsai: a Compact Representation of Trees
This paper shows how trees can be stored in a very compact form, called ‘Bonsai’, using hash tables. A method is described that is suitable for large trees that grow monotonically within a predefined maximum size limit. Using it, pointers in any tree can be represented within 6 + log₂ n bits per node where n is the maximum number of children a node can have. We first describe a general way of ...
Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries
We present the first thorough practical study of the Lempel-Ziv-78 and the Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations we can beat well-tuned popular trie data structures like Judy, m-Bonsai or Cedar.
Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth
Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...
Faster Dynamic Compact Tries with Applications to Sparse Suffix Tree Construction and Other String Problems
The dynamic compact trie is a fundamental data structure for a wide range of string processing problems. Jansson, Sadakane, and Sung (LNCS 4855, pp. 424-435, FSTTCS 2007) presented the dynamic uncompacted trie data structure of n nodes in O(n log σ) space supporting pattern matching in O((|P|/α) f(n)) time and insert/delete operations in O(f(n)) time, where f(n) = ((log log n)/log log log n) is th...
Journal: CoRR
Volume: abs/1704.05682
Publication year: 2017